replay-based method
Improving Regret Approximation for Unsupervised Dynamic Environment Generation
Unsupervised Environment Design (UED) seeks to automatically generate training curricula for reinforcement learning (RL) agents, with the goal of improving generalisation and zero-shot performance. However, designing effective curricula remains a difficult problem, particularly in settings where small subsets of environment parameterisations result in significant increases in the complexity of the required policy. Current methods struggle with a difficult credit assignment problem and rely on regret approximations that fail to identify challenging levels, both of which are compounded as the size of the environment grows. We propose Dynamic Environment Generation for UED (DEGen) to enable a denser level generator reward signal, reducing the difficulty of credit assignment and allowing for UED to scale to larger environment sizes. We also introduce a new regret approximation, Maximised Negative Advantage (MNA), as a significantly improved metric to optimise for, that better identifies more challenging levels. We show empirically that MNA outperforms current regret approximations and when combined with DEGen, consistently outperforms existing methods, especially as the size of the environment grows. We have made all our code available here: https://github.
Reconciling λ-Returns with Experience Replay
Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. Towards this, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values.
TS-ACL: A Time Series Analytic Continual Learning Framework for Privacy-Preserving and Class-Incremental Pattern Recognition
Fan, Kejia, Li, Jiaxu, Lai, Songning, Lv, Linpu, Liu, Anfeng, Tang, Jianheng, Song, Houbing Herbert, Yue, Yutao, Zhuang, Huiping
Class-incremental pattern recognition in time series is a significant problem, which aims to learn from continually arriving streaming data examples with incremental classes. A primary challenge in this problem is catastrophic forgetting, where the incorporation of new data samples causes the models to forget previously learned information. While the replay-based methods achieve promising results by storing historical data to address catastrophic forgetting, they come with the invasion of data privacy. On the other hand, the exemplar-free methods preserve privacy but suffer from significantly decreased accuracy. To address these challenges, we proposed TS-ACL, a novel Time Series Analytic Continual Learning framework for privacy-preserving and class-incremental pattern recognition. Identifying gradient descent as the root of catastrophic forgetting, TS-ACL transforms each update of the model into a gradient-free analytical learning process with a closed-form solution. By leveraging a pre-trained frozen encoder for embedding extraction, TS-ACL only needs to recursively update an analytic classifier in a lightweight manner. This way, TS-ACL simultaneously achieves non-forgetting, privacy preservation, and lightweight consumption, making it widely suitable for various applications, particularly in edge computing scenarios. Extensive experiments on five benchmark datasets confirm the superior and robust performance of TS-ACL compared to existing advanced methods. Code is available at https://github.com/asdasdczxczq/TS-ACL.
Reconciling λ-Returns with Experience Replay
Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. Towards this, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values.
DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning
Zhuang, Huiping, He, Run, Tong, Kai, Zeng, Ziqian, Chen, Cen, Lin, Zhiping
Class-incremental learning (CIL) under an exemplar-free constraint has presented a significant challenge. Existing methods adhering to this constraint are prone to catastrophic forgetting, far more so than replay-based techniques that retain access to past samples. In this paper, to solve the exemplar-free CIL problem, we propose a Dual-Stream Analytic Learning (DS-AL) approach. The DS-AL contains a main stream offering an analytical (i.e., closed-form) linear solution, and a compensation stream improving the inherent under-fitting limitation due to adopting linear mapping. The main stream redefines the CIL problem into a Concatenated Recursive Least Squares (C-RLS) task, allowing an equivalence between the CIL and its joint-learning counterpart. The compensation stream is governed by a Dual-Activation Compensation (DAC) module. This module re-activates the embedding with a different activation function from the main stream one, and seeks fitting compensation by projecting the embedding to the null space of the main stream's linear mapping. Empirical results demonstrate that the DS-AL, despite being an exemplar-free technique, delivers performance comparable with or better than that of replay-based methods across various datasets, including CIFAR-100, ImageNet-100 and ImageNet-Full. Additionally, the C-RLS' equivalent property allows the DS-AL to execute CIL in a phase-invariant manner. This is evidenced by a never-before-seen 500-phase CIL ImageNet task, which performs on a level identical to a 5-phase one. Our codes are available at https://github.com/ZHUANGHP/Analytic-continual-learning.
Recall-Oriented Continual Learning with Generative Adversarial Meta-Model
The stability-plasticity dilemma is a major challenge in continual learning, as it involves balancing the conflicting objectives of maintaining performance on previous tasks while learning new tasks. In this paper, we propose the recall-oriented continual learning framework to address this challenge. Inspired by the human brain's ability to separate the mechanisms responsible for stability and plasticity, our framework consists of a two-level architecture where an inference network effectively acquires new knowledge and a generative network recalls past knowledge when necessary. In particular, to maximize the stability of past knowledge, we investigate the complexity of knowledge depending on different representations, and thereby introducing generative adversarial meta-model (GAMM) that incrementally learns task-specific parameters instead of input data samples of the task. Through our experiments, we show that our framework not only effectively learns new knowledge without any disruption but also achieves high stability of previous knowledge in both task-aware and task-agnostic learning scenarios. Our code is available at: https://github.com/bigdata-inha/recall-oriented-cl-framework.
CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay
Zhang, Jianshu, Fu, Yankai, Peng, Ziheng, Yao, Dongyu, He, Kun
This paper introduces a novel perspective to significantly mitigate catastrophic forgetting in continuous learning (CL), which emphasizes models' capacity to preserve existing knowledge and assimilate new information. Current replay-based methods treat every task and data sample equally and thus can not fully exploit the potential of the replay buffer. In response, we propose COgnitive REplay (CORE), which draws inspiration from human cognitive review processes. CORE includes two key strategies: Adaptive Quantity Allocation and Quality-Focused Data Selection. The former adaptively modulates the replay buffer allocation for each task based on its forgetting rate, while the latter guarantees the inclusion of representative data that best encapsulates the characteristics of each task within the buffer. Our approach achieves an average accuracy of 37.95% on split-CIFAR10, surpassing the best baseline method by 6.52%. Additionally, it significantly enhances the accuracy of the poorest-performing task by 6.30% compared to the top baseline.
Reconciling λ-Returns with Experience Replay
Daley, Brett, Amato, Christopher
Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. Towards this, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values.